Frames-based Text Processing

Author

  • Steven Rosenberg
Abstract

As part of a larger project to develop an intelligent noticing system, I am designing a module to process textual material. The essential tasks of a text processor can be divided into two operations: 1) Locating a prior context, called a theme, in the story database in which to place new knowledge (I shall call this process Linking); and 2) Mapping the new information in a sentence into that context. I assume that every new sentence in well-written text contains a link to some theme.

Goldstein and Roberts (1977) have developed a working frame system (FRL) which forms the basis for our semantics. The frame system is organized as a tree structure, with generic information "bumped" up the tree, while particular frames specify new distinguishing knowledge. The generic knowledge, including procedural information, is inherited automatically. Each frame consists of a set of slots. A slot is further specified through associated Keys, which can contain procedural knowledge. Input to a preliminary version of this system consists of sentences encoded as deep case frames.

We make use of the Frames hierarchy to define a restricted set of ways one frame can refer to another. These are: (A) Direct Reference, (B) Generic Reference, (C) Contextual Reference, (D) Frame Reference, and (E) Default.

A) Direct Reference is the use of the name of a Frame to directly evoke it. PARAPHRASES function as alternative labels for the Frame, and are precompiled in the frames under a Name slot. Consequently, successive uses of PARAPHRASES function as discourse links, which directly access the frame. The correctness of such a reference depends on the mapping between the reference frame and the referent which shares a common name. For instance, the frames for "the red dog" and "the other dog" would not match, although the label "dog" is common to both instances of the generic Dog Frame.
B) Generic Reference. Any two frames will share part of a heritage path in the frame tree descending from the topmost node, which will diverge at some point. The common portion defines the semantic match between the two frames. Frames in the branching portions define semantic characteristics which are not shared. If one frame's heritage path contains another frame, the two frames may be generic referents. Thus either more generic or less generic terms can function as referents (e.g., "gun" vs. "forty-five caliber automatic pistol"). A generic reference involves a direct path between the two frames. If the path is indirect, the two terms will not be co-referential. Both "gun" and "knife" are weapons, but they are not co-referential. Once again, the mapping determines the appropriateness of the reference.

C) Contextual Reference. In the context of the following sentences

  (S1) The report discussed the recent border clash.
  (S2) The incident was not considered important.

the use of the phrase "The incident" in (S2) is an unambiguous reference to the "border clash" of the first sentence. Contextual referents are thus general terms which describe roles which frames can play. They do not fit into the direct inheritance tree of the frames they can refer to, except in the trivial sense that something is an event, or object. For instance, almost anything could be a "forecast", if it had the right slot values (i.e., had not yet occurred). However, we do not wish to make everything inherit from the forecast frame. The slot values and requirements of the frame can be used to determine whether any recent frames are described by this role, and hence are referents.

D) Frame Reference utilizes the empty slots of a frame. In:

  (S3) John shot his wife.
  (S4) The gun was a forty-five caliber automatic.

the second sentence specifies the instrument required by the action of the first sentence.
Frame reference uses the requirements on a slot value to define the lowest possible node in the frame tree from which the slot value must inherit. A demon on the instance slot of this node's frame will examine each new token inheriting from this node, filtering it through the requirements. Such demons are not computationally expensive since they do not need to examine each new input, but are instead automatically triggered by likely candidates.

E) Default. I propose as a rule of discourse that if there is no explicit link to a previous theme, the current sentence will discuss the same theme as the preceding sentence. This is referred to as the DEFAULT option.

Once a potential link is found, new information must be mapped into the indicated context. Two of the ways a sentence is related to a prior theme are:

(A) Instantiating a frame description of a theme. INSTANTIATING a frame involves substituting actual values given in sentences for default values of slots of a theme frame. Uninstantiated slots create EXPECTATIONS. Any expectations associated with a theme become candidates for instantiation by the knowledge in a new sentence linked to that theme. The new knowledge from a current sentence is mapped onto the expectations of the linked theme. Only the small set of expectations associated with the theme linked to the current sentence are ever actively considered at any one time. The function of discourse structure is to limit the number of alternatives.

(B) Augmenting expectations. Consider a frame for murder evoked by the sentence

  (S5) John murdered his wife.

An expectation for an instrumental case exists. Suppose the next sentence had said:

  (S6) He used a blunt instrument.

Although the case has not been instantiated, we now know that a semantic feature of the instrument is that it is blunt. This can be encoded in the $Req Key for the Instrument Slot. Our expectation has been AUGMENTED by this new requirement which any new value must meet.
If the third sentence said:

  (S7) A frozen leg of lamb was found next to the corpse.

the expectation for the instrument slot can now be instantiated with "leg of lamb", since this fulfills the requirements of the slot. Discourse often gives requirements for the semantic features of an instantiation, which can be used in instantiating that slot. Domain-specific knowledge can be used to generate augmentations when the text indicates the appropriate context, although the knowledge in the text itself does not support such an elaborate set of expectations.

Empirical Evidence

Eight front-page articles chosen from the New York Times were examined and the frequency of each type of thematic link was counted. The category of pronoun links was included. Out of a total of 259 sentences, only 68 were linked through pronominal references. Direct Reference links were used in 97 sentences, while 54 sentences were linked by contextual references. Default occurred in 40 sentences. No instances of frame reference as a thematic link occurred.

Three articles were examined more intensively, and their thematic structure charted. An interesting question is whether links between sentences with different themes differ from those used locally within a group of sentences which discuss a single theme. Pronominal reference was never used to link across themes. Direct Reference links in the three articles were used to link between themes about 50% of the time. Contextual referents and Generic referents in those articles were used to link themes about a third of the time.
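The frame mechanisms described in this abstract can be illustrated with a small Python sketch. It is not FRL and not the author's implementation: the class names `Frame` and `Expectation`, the helper `is_generic_referent`, and the boolean `blunt` slot are all hypothetical, chosen only to mirror the "gun"/"knife" and "blunt instrument" examples above. The sketch shows slot inheritance along a heritage path, the direct-path test for Generic Reference, and an expectation that is first augmented by a requirement and then instantiated.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass(eq=False)  # eq=False: frames are compared by identity
class Frame:
    """A node in a frame tree; slots missing locally are inherited."""
    name: str
    parent: Optional["Frame"] = None
    slots: Dict[str, object] = field(default_factory=dict)

    def heritage_path(self) -> List["Frame"]:
        """Frames from the topmost node down to this frame."""
        path, node = [], self
        while node is not None:
            path.append(node)
            node = node.parent
        return path[::-1]

    def get(self, slot: str) -> object:
        """Return the nearest value for a slot, climbing toward the root."""
        for node in reversed(self.heritage_path()):
            if slot in node.slots:
                return node.slots[slot]
        return None


def is_generic_referent(a: Frame, b: Frame) -> bool:
    """True when one frame lies on the other's heritage path (direct path)."""
    return a in b.heritage_path() or b in a.heritage_path()


@dataclass
class Expectation:
    """An uninstantiated slot of a theme frame plus its requirements."""
    slot: str
    requirements: List[Callable[[Frame], bool]] = field(default_factory=list)
    value: Optional[Frame] = None

    def augment(self, requirement: Callable[[Frame], bool]) -> None:
        """Add a requirement that any future value must meet."""
        self.requirements.append(requirement)

    def try_instantiate(self, candidate: Frame) -> bool:
        """Fill the slot if it is empty and the candidate meets every requirement."""
        if self.value is None and all(r(candidate) for r in self.requirements):
            self.value = candidate
            return True
        return False


# Hypothetical frame tree; the "blunt" slot values are assumptions.
thing = Frame("thing")
weapon = Frame("weapon", thing)
gun = Frame("gun", weapon)
pistol = Frame("forty-five caliber automatic pistol", gun)
knife = Frame("knife", weapon, {"blunt": False})
leg_of_lamb = Frame("frozen leg of lamb", thing, {"blunt": True})

# (S5) evokes a murder theme with an open instrument slot.
instrument = Expectation("instrument")
# (S6) augments the expectation: the instrument must be blunt.
instrument.augment(lambda f: f.get("blunt") is True)
# A knife fails the requirement; (S7) supplies a candidate that meets it.
knife_accepted = instrument.try_instantiate(knife)
lamb_accepted = instrument.try_instantiate(leg_of_lamb)
```

In this sketch "gun" and the pistol are generic referents because one lies on the other's heritage path, while "gun" and "knife" share only the common ancestor "weapon" and so are not. A demon as described under Frame Reference could be approximated by calling `try_instantiate` whenever a new token inheriting from the relevant node appears, though that triggering logic is not modeled here.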

Publication date: 1977